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Abstract. Statisticai node ciustering in discrete time dynamic networks is an emerging fieid that 
raises many chaiienges. Here, we expiore statisticai properties and frequentist inference in a modei 
that combines a stochastic biock modei (SBM) for its static part with independent Markov chains for the 
evoiution of the nodes groups through time. We modei binary data as weil as weighted dynamic ran¬ 
dom graphs (with discrete or continuous edges vaiues). Our approach, motivated by the importance 
of controiling for labei switching issues across the different time steps, focuses on detecting groups 
characterized by a stabie within group connectivity behavior. We study identifiabiiity of the modei pa¬ 
rameters, propose an inference procedure based on a variationai expectation maximization aigorithm 
as weii as a modei seiection criterion to seiect for the number of groups. We carefuiiy discuss our 
initiaiization strategy which piays an important roie in the method and compare our procedure with 
existing ones on synthetic datasets. We aiso iiiustrate our approach on dynamic contact networks, one 
of encounters among high schooi students and two others on animai interactions. An impiementation 
of the method is avaiiabie as a R package caiied dynsbm. 

Keywords: contact network, dynamic random graph, graph ciustering, stochastic biock modei, varia¬ 
tionai expectation maximization 


1. Introduction 


Statistical network analysis has become a major field of research, with applications as diverse as 
sociology, ecology, biology, internet, etc. General r eferences on statistical mo deling of random 
graph s incl ude the r ecent book by iKolaczvd ( 2009I) and the two reviews by Goldenberg et al 


( 201Clf l and ISniidersI ( 2flllh . While static approaches have been developed as early as in the 60’s 
(mostly in the field of sociology), the literature concerning dynamic models is much more recent. 
Modeli ng discr e te tim e dynamic networks is an emerging field that raises many challenges and we 
refer to iHolm^ ( 2015[) for a most recent review. 

An important part of the literature on static network analysis is dedicated to clustering meth¬ 
ods, with both aims of taking into account the intrinsic heterogeneity of the data and summarizing 
this data through node classification. Among clustering approaches, community detection methods 
form a smaller class of methods that aim at finding groups of highly connected nodes. Our focus 
here is not only on community detection but more generally on node classificati on based on connec¬ 
tivity behaviors, with a particular interest on model-based approaches (see e.g. iMatias and Robinl . 
l2014h . When considering a sequence of snapshots of a network at different time steps, these static 
clustering approaches will give rise to classifications that are difficult to compare through time and 
thus difficult to interpret. An important thing to note is that label switching between two succes¬ 
sive time steps may not be solved without an extra assumption e.g. that most of the nodes do not 
change group across two different time steps. However to our knowledge, this kind of assumption 
has never been discussed in the literature. In this work, we are interested in statistical models 
for discrete time dynamic random graphs, with the aim of providing a node classification varying 
with time, while controlling for label switching issues across the different time steps. Our answer 
to this challenge will be to focus on the detection of groups characterized by a stable within group 
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connectivity behavior. We believe that this is particularly suited for dynamic contact networks. 


Stochastic block models (SBM) form a widely used class of statistical (and static) random 
graphs models that provide a clustering of the nodes. SBM introduces latent (i.e. unobserved) 
random variables on the nodes of the graph, taking values in a finite set. These latent variables 
represent the nodes groups and interaction between two nodes is governed by these corresponding 
groups. The model includes (but is not restricted to) the specific case of community detection, 
where within groups connections have higher probability than across groups ones. Combining SBM 
with a Markov structure on the latent part of the process (the nodes classification) is a natural way 
of ensuring a smooth evo l ution of the groups across time and has already been considered in the 
literature. In Yang et ^ ( 201 ill , the authors consider undirected, either binary or finitely valued, 
discrete time dynamic random graphs. The static aspect of the data is handled through SBM, 
while its dynamic aspect is as follows. For each node, its group membership forms a Markov chain, 
independent of the values of the other nodes memberships. There, only the group membership is 
allowed to vary across time while connectivity parameters among groups stay constant through 
time. The authors propose a method to infer these parameters (either online or offlin e), based on a 
combi nation of Gibbs sampling and simulated annealing. For binary random graphs. IXu and Herol 
( 2014 i propose to introduce a state-space model through time on (the logit transform of) the 
probability of connection between groups. Contrarily to the previous work, both group membership 
and connectivity parameters across groups may vary through time. As such, we will see below that 
this model has a strong identifiability problem. Their (online) iterative estimation procedure is 
based on alternating two steps: a label-switching method to explore the space of node groups 
configuration, and the (extended) Kalman f ilter that o p timiz es th e likelihood when th e groups 
memberships are known. Note that neither IVang et ( 201111 nor IXu and Herol (|2ni4l l propose 
to infer th e number of clusters. Bayesian variants of these dynamic SB models may be found for 
instance in Ishiguro et al. ( 201Clll : Herlau et al. ( 2013 1. 

Surprisingly, we noticed that the above mentioned methods were evaluated on synthetic datasets 
in terms of averaged value over the time steps of a clustering quality index computed at fixed time 
step. Naturally, those indexes do not penalize for label switching and two classihcations that are 
identical up to a permutation have the highest quality index value. Computing an index for each 
time step, the label switching issue between different time steps disappears and the classification 
task becomes easier. Indeed, such criteria do not control for a smoothed recovery of groups along 
different time points. It should also be noted that the synthetic experiments from these works were 
performed under the dynamic version of the binary affiliation SBM, which has non identifiable 
parameters. The affiliation SBM, also known as planted partition model, corresponds to the case 
where the connectivity parameter matrix has only two different values: a diagonal one that drives 
within groups connections and an off-diagonal one for across groups connections. In particular, the 
label switching issue between different time steps may not be easily removed in this particular case. 


Other approaches for model-based clustering of dynamic random graphs do not rely directly on 
SBM but rather on variants of SBM. We mention the random subgraph model (RSM) that combines 
SBM with the a priori knowledge of a nodes partition (inducing subgraphs), by authorizing the 
groups proportions to differ in the differe nt subgrap hs. A dynamic ver sion of RSM that builds 
upon the approach of IXu and Herd (2014h appear s in IZreik et al.l ( 20151 1. Detection of persistent 
communities has been proposed in Liu et al.l ( 2014l l for directed and dynamic graphs of call counts 
between individuals. Here the static underlying model is a time and degree-corrected SBM with 
Poisson distribution on the call counts. Groups memberships are independent through time instead 
of Markov, but smoothness in the classification is obtained by imposing that within groups expected 
call volumes are constant through time. Inference is performed through a heuristic greedy search 
in the space of groups memberships. Note that only real datasets and no synthetic experiments 
have been explored in this latter work. 

Another very popular statistical method for analyzing static networks is based on latent space 
models. Each node is associated to a po int in a latent spa ce a nd probability of connect ion is higher 
for nodes whose latent points are closer ( Hoff et al] . l2002ll . In Sarkar and Moor j ( 2005ll . a dynamic 
version of the latent space model is proposed, where the latent points follow a (continuous state 
space) Markov chain, with transition kernel given by a Gaussian perturbation on current position 
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with zero mean and small variance. Latent position inference is performed in two steps: a first 
initial guess is obtained through multi dimension al scaling. Then, nonl inear optimization is used 
to maximize the model likelihood. Th e work by Xu and Zheng (1200911 is very similar, adding a 
clustering step on the nodes. Finally, Heaukulani and Ghahramani ( 2013l l rely on Monte Carlo 
Markov Chain methods to perform a Bayesian inference in a more complicate setup where the 
latent positions of the nodes are not independent. 


Mixe d membersh i p mo dels (jAiroldi et al.l . 120081) are also explored in a dynamic context. The 


work by IXing et ^ ( 2010ll relies on a state space model for the evolution of the parameters of 
the priors of both the mixed membership vector of a node and the connectivity behavior. Infer- 
ence is carried out through a variational Bayes expectation maximisation (VBEM) algorithm (e.g. 


Jordan et ah . 19991 ). 


This non exhaustive bibliography on model-based clustering methods for dynamic random 
graphs shows both the importance and the huge interest in the topic. 


In the present work, we explore statistical properties and frequentist inference in a model that 
combines SBM for its static part with independent Markov chains for the evolution of the nodes 
groups through time. Our approach aims at achieving both i n terpretability and st atistical accu¬ 
racy. Our setup is very close to the ones of Yang et ^ ( 201 ill : Xu and Herol ( 2014 ). the first and 
main difference being that we allow for both groups memberships and connectivity parameters to 
vary through time. By focusing on groups characterized by a stable within group connectivity 
behavior, we are able to ensu re parameter identifiability and thus valid statistical inference. In¬ 
deed, whil^ ^n^^^^J^OllI) use the strong constraint of fixed connectivity parameters through 
time, IXu and Hero ( 2014h entirely relax this assumption at the (not acknowledged) cost of a label 
switching issue between time steps. Second, we model binary data as well as weighted random 
graphs, should they be dense or sparse, with discrete or continuous edges. Third, we propose 
a model selection criterion to choose the number of clusters. To simplify notation, we restrict 
our model to undirected random graphs with no self-loops but easy generalizations would handle 
directed datasets and/or including self-loops. 

The manuscript is organized as follows. Section [^?T] describes the model and sets notation. Sec¬ 
tion 12.21 gives intuition on the identifiability issues raised by authorizing both group memberships 
and co nnectivity parameters to freely vary with time. This was not pointed out bv IXu and Hero 
1 2014h despite they worked in this context. The section motivates our focus on groups character¬ 
ized by a stable within group connectivity behavior. Section [2.31 then establishes our identifiability 
results. To our knowledge, it is the first dynamic random graph model where parameters identifia¬ 
bility (up to label switching) is discussed and established. Then, Section [3] describes a variational 
expectation maximization (VEM) procedure for inferring the model parameters and clustering the 
nodes. The VEM proc edure works with a fi xed number of groups and an Integrated Classifica¬ 
tion Likelihood (ICL, iBiernacki et ah . 200nf ) criterion is proposed for estimating the number of 
groups. We also discuss initialization of the algorithm - an important but rarely discussed step, 
in Section 13.21 Synthetic experiments are presented in Section U) There, we discuss classification 
performances without neglecting the label switching issue that may occur between time steps. In 
Section [SI we illustrate our approach with the analysis of real-life contact networks: a dataset of 
encounters among high school students and two other datasets of animal interactions. We believe 
that our model is particularly suited to handle this type of data. We mention that the methods are 
implemented into a R package available at http: //Ibbe .univ-lyonl. fr/dynsbm and will be soon 
available on the CRAN. Supplementary Materials (available at the end of this article) complete 
the main manuscript. 


2. Setup and notation 

2.1. Model description 

We consider weighted interactions between N individuals recorded through time in a set of data 
matrices Y = (Y*)i<t<T- Here T is the number of time points and for each value t G {I,... ,T}, 
the adjacency matrix Y* = (Yj* )i<i^j<Ar contains real values measuring interactions between 
individuals i,j € {I,..., N}"^. Without loss of generality, we consider undirected random graphs 




















































4 C. Mafias and V. Miele 


without self-loops, so that Y* is a symmetric matrix with no diagonal elements. 

We assume that the N individuals are split into Q latent (unobserved) groups that may vary 
through time, as encoded by the random variables Z = {Zl)i<t<T.i<i<N with values in := 

Q}^^- This process is modeled as follows. Across individuals, random variables (^i)i<i<Ar 
are independent and identically distributed (iid). Now, for each individual i G {!,...,TV}, the 
process Zi = (Zf)i<:t<T is an irreducible, aperiodic stationary Markov chain with transition matrix 
TT = ('^qq')i<q,q'<Q initial stationary distribution a = (ai,..., ag). When no confusion occurs, 
we may alternatively consider Z- as a value in Q or as a random vector Z- = (Z-j^, ..., Z-q) G 


{ 0 , 1}‘5 constrained to Z-^ = 1 . 

Given latent groups Z, the time varying random graphs Y = (Y*)ict<T are independent, the 
conditional distribution of each V* depending only on Z*. Then, for fixed 1 < t < T, random 
graph y* follows a stochastic block model. In other words, for each time t, conditional on Z*, 
random variables {Y*j)i<i^j<]s[ are independent and the distribution of each only depends on 
Z*, Zi. For now, we assume a very general parametric form for this distribution on R.. Follow¬ 
ing l^^tooisTanT^atiai ( 2012 1 . in order to take into account possible sparse weighted graphs, we 
explicitly introduce a Dirac mass at 0, denoted by Sq, as a component of this distribution. More 
precisely, we assume 


yA|{y44 = 1} ^ (1 - /3‘,)do(-) + 


( 1 ) 


where {F{-, 7 ), 7 G F} is a parametric family of distributions with no point mass at 0 and densities 
(with respect to Lebesgue or counting measure) denoted by /(^y). This could be the Gaussian 
family with unknown mean and variance, the truncated Poisson family on N \ {0} (leading to a 
0-inflated or 0-deflated distribution on the edges of the graph), a finite space distribution on M 
values (a case which comprises nonparametric approximations of continuous distributions through 
discretization into a finite number of M bins), etc. Note that the binary case is encompassed 
in this setup with F{-,j) = di(-), namely the parametric family of laws is reduced to a single 
point, the Dirac mass at 1 and conditional distribution of Fj* is simply a Bernoulli S(/3*;). In 
the following and by opposition to the ’binary case’, we will call ’weighted case’ any setup where 
the set of distributions F is parametrized and not reduced to a single point. Here, the sparsity 
parameters /?* = il3qi)i<q,i<Q satisfy /?*; G [0,1], with /3‘ = I corresponding to the particular case 
of a complete weighted graph. As a result of considering undirected graphs, the parameters 7 ^; 
moreover satisfy and 7 *; = 7 *^ for all 1 < g, 1 < Q. Note that for the moment, SBM 

parameters may be different across time points. We will go back to this point in the next sections. 
The model is thus parameterized by 


9 = (7r,/3,7) = (7r,{/3‘,7‘}l<t<T) = i{'^qq'}l<q,q'<Q, {l^lh'lll}l<t<T,l<q<l<Q) G 0, 

and we let Pg denote the probability distribution on the whole space x R^. We also let 7 ) 

denote the density of the distribution given by o, namely 


Vy G R, ti(y; /3, 7 ) = (1 - I3)l{y = 0} -k Pf{y, j)l{y ^ 0}, 


where 1{A} is the indicator function of set A. With some abuse of notation and when no confusion 
occurs, we shorten ?!>(•;/Jj;, 7 ^;) to </>*;(•) or </>*;(■; 6 >). Directed acyclic graphs (DAGs) describing 
the dependency structure of the variables in the model with different levels of detail are given in 
Figure[T] Note that the model assumes that the individuals are present at any time in the dataset. 
An extension that covers for the case where some nodes are not present at every time point is given 
in Section |E] from the Supplementary Materials and used in analyzing the animal datasets from 
Section 15.21 


2.2. Varying connectivity parameters vs varying group membership 

In this section, we give some intuition on why it is not possible to let both connectivity parameters 
and group membership vary through time without entering into label switching issues between 
time steps. To this aim, let us consider the toy example from Figure [5J 

This figure shows a graph between iV = 12 nodes at two different time points Node 1 is 

a hub (namely a highly connected node), nodes 2 to 6 form a community at time ti (they tend to 
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Figure 1. Dependency structures of the model. Top: general view corresponding to hidden Markov model 
(HMM) structure; Middle: details on latent structure organization corresponding to N different ild Markov 
chains Zi = {Zl)i<t<T across individuals; Bottom: details for fixed time point t corresponding to SBM 
structure. 


form a clique) and are peripheral nodes at time ^2 and finally nodes 7 to 12 are peripheral at time 
ti while becoming a community at time t 2 . In observing those two graphs (without the clusters 
indicated by the nodes shading), there are at least two possible statistical interpretations relying 
on a clustering with <5 = 3 groups. The first (illustrated in Figure [2]) is to consider that the 3 
different groups at stake respectively are: hubs (in white), a community (light grey) and peripheral 
nodes (dark grey) and that the nodes 2 to 6 change group from a community to peripheral group 
between time ti and t 2 while nodes 7 to 12 change from peripheral group to a community between 
those same time points (node 1 stays a hub, in white, for both time points). Another point of 
view would rather to consider that nodes 2 to 6 stayed in the same group that was organized as 
a community at time ti and is now characterized by peripheral behavior at time t 2 , while nodes 
7 to 12 also stayed in the same group, behaving peripherally at time ti and now as a community 
at time t 2 - Obviously none of these two interpretations is better than the other. Without adding 
constraints on the model, the label switching phenomenon will randomly output one of these two 
interpretations (clustering at time ^2 is the same when permuting light grey and dark grey colors). 
In this context, it is thus impossible to recover groups memberships trajectories. We formalize 
these ideas in the next section through the concept of parameter identifiability. 

The main problem with the previous example comes from the possibility of arbitrarily relabeling 
the groups between two time steps. We mentioned in the introduction that a natural idea would 
be that most of the individuals should not change groups between successive time steps. However 
note that imposing constraints on the transition matrix tt (e.g. it has large diagonal elements) 
is useless because estimation would then be unfeasible. Indeed, without imposing zero values on 
the off diagonal elements of tt (i.e. Zl does not depend on t), it can happen that there is no 
labelling of the groups that ensures most individuals stay in the same group. Thus it is not always 
possible to label the groups so that between two successive time steps, estimation of the transition 
parameters would be constrained to have large diagonal elements. Thus we choose to focus our 
attention on groups characterized through their stable within group con nectivity parame ter. This 


choice is reminiscent of works on detection of persistent communities (ILiu et al.l . 120141) . except 
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t — ti 


t^t2 




Figure 2. Connectivity parameters or group membership variation: a toy exampie. 


that we do not restrict our attention to communities (i.e. groups of highly connected individuals). 
In the above toy example, this corresponds to the first interpretation rather than the second. 
Other choices could be made and we believe that this one is particularly suited to model social 
networks or contact data where the groups are defined as structures exhibiting a stable within 
group connectivity behavior and individuals may change groups through time (see Section [5] for 
applications on real datasets). 


2.3. Parameters identifiability 

Let us recall that with discrete latent random variables, identifiability can only be obtained up to 
a label switching on the node groups Q. For any permutation a in ©q (the set of permutations 
on Q) and any 0 S 0, we define 

a{e) := {{T^a(q)<T(q')}l<q,q'<Q, {l3l(q)a(l) nl{q)a{l)}l<t<T,l<q<l<Q) ■ 

It should be noted that here, the permutation a acts globally, meaning that it is the same at 
each time point t. Now, if we let denote the marginal of Pe on the set of observations Y, 
identifiability of the parameterization, up to label switching means 

V6I,0g 0, P^ =pF ^ 3aG ©q,6 » = ct(0). 

Without additional constraints on the transition matrix tt or on the parameters (/3, 7 ), the param¬ 
eters may not be recovered up to label switching. However, it could be that the static SBM part 
of the parameter is recovered up to a local label switching. Local label switching on SBM part of 
the parameter is the weaker following property 

V0,0G0, Fj = Fj ^ 3ai,...,aTG65,V<,(/3‘,7‘)=at(/3‘,f). 

This property is not satisfactory since clustering in models that only satisfy a local identifiability 
of SBM part of the parameter prevents from obtaining a picture of the evolution of the groups 
across time. 

A formal example of the fact that if both and (/ 3 ‘, 7 ‘) may vary through time, then the 
parameter can not be identified up to label switching without additional constraints is given in 
Section El from Supplementary Materials. We stress that this example implies that dynamic 
affiliation SBM (or planted partition model) does not have identifiable parameters and groups 
may not be recovered consistently across time. This is an important point as previous authors 
have tried to recover groups from this type of synthetic datasets and evaluated their estimated 
classification in a non natural way. 

As a consequence and following the ideas developed in Section 12.21 we choose to impose the 
following constraints on the parameter 9 


( Binary case: := Pqq, 

\ Weighted case: 7 *, = 7 *^ := 7 ,,. 


yqGQ,yt,t' G{1,...,T}, 


( 2 ) 
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Under the above condition, we focus on groups characterized by a stable within group connectivity 
behavior (f3qq or jqq is constant with time). Note that this constraint could in fact be weakened 
as follows 


Vg eQ,vt,t'e r},3Ze T}, 


Binary case: 

Weighted case: 7 *; = 7 *;. 


(3) 


In this latter condition, the group I that helps characterizing the group q between the two dif¬ 
ferent time points t,t' may depend on Such a constraint may be useful if groups are not 

characterized by a stable within group connectivity but rather by their connectivity to at least one 
specific other group. Note that for estimation purposes, this group I needs to be known in advance 
(for each g, t, t') which requires a more complex a priori modeling of the data. In the following, 
we choose to restrict our attention to constraint ([ 2 ]) only but our theoretical results remain valid 
under We prove below that these constraints, combined with the same conditions used for 
identifiability in the static case, are sufficient to ensure identifiability of the parametrization in our 
dynamic setup. 


Assumption 1 (Weighted case). We assume that 

i) For any t>\, the Q{Q + l )/2 values {7*;, ^ < q < I < Q} are distinct, 

a) The family of distributions J-' = {/(•,7),7 € F} is such that all elements /(•,7) have no point 
mass at 0 and the parameters of finite mixtures of distributions in T are identifiable, up to 
label switching. 


A ssumption [J is the co ndition that ensures identifiability of static weighted SBM (see Theorem 12 
in I Allman et al.l . l 2011 1 . Note that it does not impose any constraint on the sparsity parameters /3*; 
in the weighted case. In particular and for parsimony reasons, these may be chosen identical (to 
some P* or some constant /3) or set to two different values, e.g. /3*g = and /3*j = /3out whenever 
g Z at each time point (or even constant with time). 


Proposition 1. Considering the distribution Fj on the set of observations and assuming the 
constraint 0, the parameter 6 = (7r,/3,7) satisfies the following: 

• Binary case: 0 is generically identified from Pj, up to label switching, as soon as N is not 
too small with respect to Q, 

• Weighted case: Under additional Assumption[l\ the parameter 9 is identified from P"g , up 
to label switching, as soon as N > 3. 


Generic identifiab ility means ’up to excludin g a subset of zero Lebesgue measure of the parameter 
set’. We refer to Allman et al. ( 20091 l201l[ l for more details. In particular for the binary case, 
assuming that the matrix of Bernoulli parameters jd has distinct rows is a generic constraint 
(meaning that it removes a subset of zero Lebesgue measure of the parameter set). As we do not 
specify the whole generic constraint that is needed here, we do not stress that one either. But the 
reader should have it in mind in the binary setup. Finally, note that the condition on the number of 
nodes N being not too small in the binary case is given precisely in Theorem 2 from lAllman et al.l 
(|20111) . The particular affiliation case (planted partition) is not covered by these results and further 
discussed in Section [B] from Supplementary Materials. 


Proof. The proof combines the approaches_o£jLeroiDd (1992) for proving identifiability of 
hidden Markov models (HMM) parameters and I Allman et al.l (1201111 that studies identifiability for 
(static) SBM. 

First, we fix a time point t > 1 and consider the marginal distribution Pe(y*). According to 
T heorems 1,2 (b i nary case with Q = 2 and Q > 3, respectively) and Theorem 12 (weighted case) 


Allman et al 


(12011!) on parameters identifiability in static SBM, there exists a permutation at 
on the group labels Q such that we can identify (, 5 ‘, 7 *) as well as the marginal distribution o^, up 
to this permutation. This result stands generically in the binary case only. 
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Now, for two different time points t,t', we use the constraint and the assumption of distinct 
parameter values in order to identify the parameters {(/3*, 7 *), t > 1 } up to a (common) permutation 
(T on Q. Indeed, in the binary case, assuming that the within groups Bernoulli parameters satisfy 
'^99 “ Pgg 1 < 9 ^ Q} contains Q distinct values (a generic constraint) 

suffices to obtain a global permutation a, not depending on time t, up to which {(/ 3 ‘, 7 ‘),< > 1 } 
are identified. The same applies in the weighted case, by assuming equality between the parameter 
llg = Iqq for any t, t'. 

It remains to id entify the tran sition matrix tt (up to the same permutation tr). We fix an edge 
(f,j) and following iLeroux ( 1992 ). consider the bivariate distribution This is given 

by 


nmxn = 


qi:q2,h,h&Q 


)• 


(4) 


Note that iTeicheil ( 1967 1 has proved the equivalence between parameters identifiability of the 
mixtures of a family of distributions and parameters identifiability of the mixtures of finite products 
from this same family. For the sake of clarity, we develop his proof adapted to our context. We 
thus write 


q2,h€Q qi,li€Q 


As the mixtures from the family 1 < 9 < ^ < Q} have identifiable parameters (Assumption[Tl 

ii)), we can identify the mixing distribution 


'y ' Q!9l‘^7^9l92'^7i2'/'gi/i (bi:/))'^(/3*+i 

q2,h&Q qi.heQ ^ ^ 


’'92^2 


(•)■ 


Now, applying again this identifiability at time t and constraint dm, we may identify the whole 
mixing distribution 


E E ^91 ^7^9192^7^2 

92,76Q 9i>7eQ 


7i7 


)(•) ® ,^‘+1 ) 
'■^92^2 '92 1^2 ' 


(•)• 


This proves that the mixture given by ([4]) has identifiable components. From this mixture and 
the fact that we already identified the parameters (/3,7) up to a global permutation, we may 
extract the set of coefficients 1 < 9 , 9 ' < Q} that corresponds to the components 4'qq4’lfq' 

in 0 . As we also already obtained the values {aq, 1 < 9 < Q}, this now identifies the parameters 
{nqqi, 1 < q,q' < Q}. This concludes the proof. 


3. Inference algorithm 


3.1. General description 

As usual with latent variables, the log-likelihood logPe(Y) contains a sum over all possible latent 
configurations Z and thus may not be computed except for sma ll values of N and T. A classical 
solution is to rely on expectation-maximisation (EM) algorithm ( Dempster et al.l . Il977ll . an itera¬ 
tive procedure that finds local maxima of the log-likelihood. The use of EM algorithm relies on the 
computation of the conditional distribution of the latent variables Z given the observed ones Y. 
However in the context of stochastic block model, this distribution has not a factored form and 
thus may not be computed efficiently. A classical so l ution is to rely on variational approximations 
of EM algorithm (VEM, see for instance Jordan et al.L 1999[). These approximations have been first 
proposed in the context o f SBM in DandirL.et_alJ "( 2008ll and later de veloped in many direc tions. 


such as online proced ures (iZanghi et ah . 
refer to the review bv iMatias and RobinI 


2008 


20 loll or Bayesian VEM ( Latouche et al'l 2012h . We 


201J) for more details about VEM algorithm (in particular 


a presentation of EM viewed as a special instance of VEM) and its comparison to other estimation 
procedur es in SBM. Note that convergenc e properties of VEM algorithms are discussed in full gen ¬ 
erality in Guna wardana and Bvrn3 (200^ and in the special case of SBM in Celisse et al. ( 20121) : 
Bickel et al.l (20131. 
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VEM for dynamic SBM. In our context of dynamic random graphs, we start by writing the complete 
data log-likelihood of the model 

N Q T N 

logPe(Y,Z)=^£z4loga, + ^^ ^ Zl-^Zl^AogTT,,, 

i—1 q—1 t—2 i—1 l<q,q'<Q 

T 

+ E E E (5) 

t=l l<i<j<N l<q,l<Q 

We now explore the dependency structure of the conditional distribution Pg(Z|Y). First, note that 
it can be easily deduced from the DAG of the model (Figure [U top) that 

T 

Fg{Z\Y)=Fg{Z^\Y^)l[Fg{Z*\Z*-\Y*). 

t^2 

However, the distribution Fe{Z*\Z*~^,Y*) = Fe{{Zl)i<i<]^\Z^~^,Y*) can not be further factored. 
Indeed, for any i j, the variables Zl,Z* are not independent when conditioned on Y*. Our 
variational approximation naturally considers the following class of probability distributions Q := 
Qt parameterized by t 


N 


N 


WZ) =l[^r{Z,) = l[Qr{Zr)l[Qr{Zl\Zl-^) 

i—1 i—1 t—2 

N Q T 


2=1 g=l 


t—2 


where for any values (t, i, q, q'), we have r(i, q) and t (<, i, q, q') both belong to the set [0,1] and are 
constrained by '^qT{i,q) = I and = 1- This class of probability distributions Qt- 

corresponds to considering independent laws through individuals, while for each i G A^}, 

the distribution of Zi under Qt- is the one of a Markov chain (through time t), with inhomogeneous 
transition T{t,i,q,q') = Qr(^| = = <?) and initial distribution T{i,q) = QriZ} = q). 

We will need the marginal components of Qi-, namely Tmarg(i,*,9) := QriZf = q). These 
quantities are computed recursively by 


Q 

T'marg(l,b<3') = T{i,q) and Vt > 2, Tme,rgit,i,q) = E '^marg(t “ I, i, i, q' , Q) ■ 

q' = l 

Note also that all these values Ti„arg(i, *, (?) depend on the initial distribution T{i,q). Entropy of 
Qt is denoted by 'H(Qi-)- Using this class of probability distributions on VEM algorithm is an 
iterative procedure to optimize the following criterion 

J(0,r) :=EQ^(logPe(Y,Z))+V7(Q.) 

N Q 

=EE Tii,q)[logaq - logT{i,q)] 

i—1 q—1 
T N 

+ EE E '^rna.rgit- l,i,q)rit,i,q,q')[^OgTTqq, -\ogT{t,i,q,q')] 

t—2 i=l 
T 

+ E E E log(fliiY*^). (6) 

t=l l<i<j<N l<q,l<Q 

It consists in iterating the following two steps. At fc-th iteration, with current parameter value 
(r«, 6 »W), we do 

• VE-step: Compute = Argmax^ J( 0 U) ^ . 7 -)^ 
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• M-step: Compute = Argmaxg J(0, 

Proposition 2. The value f that maximizes in t the function J{9,t) satisfies the fixed point 
equation 

N Q 

Vt > 2,Vi > l,^q,q S Q, T{t,i,q,q) oc tt,,/ \ 

j=ii'=i 

where oc means ’proportional to’ (the constants are obtained by the constraints on t). Moreover, 
the value (tt,/?) that maximizes in (7r,/3) the function J{9,t) satisfies 

T N 

TTqq' « ^ ^ Tmargit - 1, i, q)T{t, i, q, q'), 
t^2 

Si 1 'Tmargit fi^q)Tmarg{ti 

_ ’’■J 

^‘\j 'Tmarait, i, q)Xmara(t,j, 1) 

~ X/t ^^i,j Tmargit, i, q)Tmargit, j, q)^Yf^^0 

Tmargit, i, q)Tmargit,j, q) 

The proof of this result is immediate and omitted. Note that we have given a formula with 
constant (through time) values fiqq for any group q € Q. While this assumption is an identifiability 
requirement in the binary setup, it is not necessary in the weighted case. In this latter case, we 
use it only for parsimony reasons. The corresponding formula when this parameter is not assumed 
to be constant may be easily obtained. 

To complete the algorithm’s description, we provide equations to update the parameters r(i, q) 
and aq of initial distributions as well as the connectivity parameter 7 . First, optimization of J(0, r) 
with respect to the initialization parameters Tii,q) is a little bit more involved. By neglecting the 
dependency on Tii,q) of some terms appearing in criterion J, we choose to update this value by 
solving the fixed point equation 

N Q 

Vf>l,V( 7 GQ, Tii,q)cxaq^'^4>\iiY^j)^'^Ti)^ ( 7 ) 

3=1 z=i 

Our experiments show that this is a reasonable approximation (Section 0]). For the sake of com¬ 
pleteness, we provide in Section E] from Supplementary Materials the exact equation satisfied by 
the solution. 

Now parameter a. is not obtained from maximizing J as it is not a free parameter but rather 
the stationary distribution associated with transition tt. Thus, a. is obtained from the empirical 
mean of the marginal distribution fmarg over all data points 

^ T N 

VgeQ, = Tmargit, i, q) ■ 

t=l i=l 

Finally, optimization with respect to 7 depends on the choice of the parametric family {/(•, 7), 7 G 
F}. We provide explicit formulae for the most widely used families of conditional distributions on 
the edges (binary or weighted case) in Section |D] from Supplementary Materials. More precisely, 
we give these formulae for Bernoulli, finite space, (zero-inflated or deflated) Poisson and Gaussian 
homoscedastic distributions. 

Remark 1. Performing EM algorithm in HMM (Figure top) requires the use of forward- 
backward equations in order to deal with transition terms appearing in the complete data 

log-likelihood ®. In our setup, forward-backward equations are useless and replaced by a variational 
approximation. Indeed, it can be seen from Figure [U middle, that the conditional distribution of 
Zlf^Zjq given the data can not be computed exactly through such forward-backward equations. This 
is due to the fact that the variables Y* = {Yl}ij depend on all hidden ones Z\,..., and focusing 
only on Z\ is not sufficient to determine their distribution. 


y{q,q')&Q^ 

yq&Q, 
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Remark 2. In \ Yana et al\ H201A) . the authors derive a VEMprocedure in a similar (slightly less 
general) setup, but their variational approximation uses independent marginals (through individuals 
and also time po ints). As a conseq uence, the VE-step that they derive is more involved than ours 


(see Section 4 in \Ya,no et a,l\ . \20lA ) 


seam 

U). 


Model selection. Model selection on the number of groups Q is an important step. In case of latent 
variables, when the true data likelihood may not be easily computed, model selection may be done 


by maximizing an integrated classification likelihood (ICL) criterion ( Biernacki et al. . 200Clll . For 


any number of groups Q > 1, let be the estimated parameter value with Q groups and Z the 
corresponding maximum a posteriori (MAP) classification at 6q. In our case, the general form of 
ICL is given by 


ICL{Q) = log (Y, Z) - -Q(Q - I) log[7V(T - 1)] - pen{N, T, (3, 7 ), 


( 8 ) 


where the first penalization term accounts for transition matrix tt and pen{N, T, (3, 7 ) is a penalizing 
term for the connectivity parameters {(3,'j). As the number of parameters (/3,7) depends on the 
specific form of the family {/(•;7),7 G T}, we provide context dependent expressions for ICL in 
Section |D] from Supplementary Materials (along with the expressions of parameter estimates from 
the M-step for each case considered). Note that the first penalization term accounts for N{T — 1) 
latent transitions while the number of observati ons corresponding to SBM part of the parameter 
in pen{N,T, (3,^) will be different. We refer to Daudin et al. ( 2008h for an expression of ICL in 
static SBM that shows an analogous difference in penalizing groups proportions or connectivity 
parameters. Note that there are no theoretical results on the convergence of the ICL procedure 
(neither in simple mixture models nor in the SBM case). However the criterion shows very good 
performances on synthetic experiments and is widely used (see Section 2] for experiments in our 
setup). Nonetheless we mention that the criterion is not suited in the case of a finite space 
conditional distribution (see Example [5] in Section [D] from Supplementary Materials for more 
details). 


3.2. Algorithm initialization 

All EM based procedures look for local maxima of their objective function and careful initialization 
is a key in their success. For static SBM, VEM procedures often rely on a k-means algorithm on the 
adjacency matrix to obtain an initial clustering of the individuals. In our context, the dynamic 
aspect of the data needs to be properly handled. We choose to initialize our VEM procedure by 
running k-means on the rows of a concatenated data matrix containing all the adjacency time 
step matrices Y* stacked in consecutive column blocks. As a result, our initial clustering of the 
individuals is constant across time (namely Z) does not depend on t). A consequence of this choice 
is that this initialization works well when the groups memberships do not vary too much across 
time (see Sectionwhere we explore different values of transition matrix tt). In practice, real-life 
contact networks will either exhibit nodes that do not change group at all (see Section [S]) or nodes 
that leave a group and then come back to this group. Our initialization is performant in these 
cases. Another consequence is that while we would expect the performances of the procedure to 
increase with the number T of time steps, we sometimes observe on the contrary a decrease in 
these performances. This is due to the fact that increasing T also increases the probability for 
an individual to change group at some point in time and thus starting with a constant in time 
clustering of the individuals, it becomes more difficult to correctly infer the groups membership at 
each time point (see in Section 0] the difference between results for T = 5 and T = 10). 

To conclude this section, we mention that initialization is also a crucial point for oth e r met hods 
and we discuss in the next section its impact on the algorithm proposed in Yang et ^ (I2OII I. 


4. Synthetic experiments 

The methods presented in this manuscript are implemented into a R package and available at 
http: //Ibbe .univ-lyonl. f r/dynsbm. While the complexity of the estimation algorithm is 0{TQ^N^), 
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Table 1. Bernoulli parameter values in 4 
different cases, plus an affiliation example. 


Easiness 

/3ii 

/3i2 

1322 

low- 

0.2 

0.1 

0.15 

low-1- 

0.25 

0.1 

0.2 

medium- 

0.3 

0.1 

0.2 

medium-1- 

0.4 

0.1 

0.2 

med w/ affiliation 

0.3 

0.1 

0.3 


the computation time remains acceptable for networks with a few thousands of nodes (see Supple¬ 
mentary Figure ISTI) . 


4.1. Clustering performances 

In this section, we explore the performances of our method for clustering the nodes across the 
different time steps. To this aim, we will c onsider two different criteria. We rely on the adjusted 


dinerent time ste ps, lo tms aim, we will c onsider two dinerent criteria. We rely on tne adjusted 
Rand index (ARl lHiibert and Arabid . 198 !t[) to evaluate the agreement between the estimated and 
the true latent structure. This index is smaller than 1, two identical latent structures (up to label 
switching) having an ARI equal to 1. Note that it can take negative values and is built on Rand 
index with a cor rection for chance . Now there ar e two different ways of using ARI in a dynamic 
setup. Following I Yang et al. ( 20Il[l : Xu and Herol ( 2014 1. we first consider an averaged value over 
the different time steps 1 < t < T of ARR computed at time t. In this approach the dynamic setup 
may be viewed as a way of improving the node clustering at each time step over a method that 
would cluster separately the nodes at each time step. However, this averaged index does not say 
anything about the smooth recovery of group memberships along time. In particular, it is invariant 
under local switching on SBM part of the parameter (see Section [2131) • Thus we also consider the 
global ARI value that compares the clustering of the set of nodes for all time points with the true 
latent structure. Obviously, good performances for this criteria are more difficult to obtain. 

We use synthetic datasets created as follows. We consider binary graphs with N = 100 nodes 
and T G {5; 10} different time steps. We assume Q = 2 latent groups with three different values 
for the transition matrix tt 


rriow — 


0.6 

0.4 


0.4 

0.6 


1 rr medium — 


/0.75 

0.25\ 

(0.9 

0.1\ 

lo.25 

0.75/ 

; TThigh “ 1 Q 

0.9/ 


These three cases correspond respectively to low, medium and high group stability. Namely 
in the first case, individuals are more likely to change group across time, resulting in a more 
difficult problem from the point of view of the initialization of our algorithm (see Section [3?2l) . The 
stationary distribution in those three cases is a = (1/2,1/2) so that the two groups have similar 
proportions. 

As for the Bernoulli parameters /3, we explore 4 different cases (see Table 14.11) representing 
different difficulty levels, plus a specific example of affiliation for which we recall that parameters 
are not identifiable in the dynamic setting. We note that th is latter case s atishe s the separability 
condition established for sparse planted partition models Isee lMossel et ah . 2014 for more details). 
This means that static reconstruction of the groups is conjectured to be possible (and we consider 
that this static problem corresponds to a medium difficulty). 

For each combination of (tt,^), we generate 100 datasets, estimate their parameters, cluster 
their nodes and report in Figure [3] boxplots of a global and of an averaged ARI value. 

Figure [3] confirms that it is more difficult to obtain a smooth recovery of the groups (measured 
through global ARI) than a local one (measured through averaged ARI), the former values being 
globally smaller than the latter. In particular in the affiliation model, we observe that while the 
averaged ARI is rather good (all values close to 1), the global one can be low (for e.g. with low 
group stability or medium group stability and T = 10 time points). However in the identifiable 
cases, we obtain good performances for this global index (values above 0.8) when group stability 
is not too low or when connectivity parameters are well enough separated (medium f3 values). 
As expected, the clustering performances increase (i.e. ARI values increase) with group stability 
(from TT low to high) and with a better separation between the groups connectivity behaviors 
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low group-stability 


medium group-stability 


high group-stability 



beta-separability 


Figure 3. Boxplots of global ARI (white, left) and averaged ARI (grey, right) In different setups. From 
left to right: the three panels eorrespond to tt = (panels A,D), -rrmedium (panels B,E) and 

TThigh (panels C,F), respectively. In each panel, from left to right: results corresponding to /3 = 
low—, low+, medium-,medium+ and affiliation case, respectively. First row: T = 5 time points, second 
row: T = 10. 


(from low- to medium+ easiness). When increasing the number of time points from 5 to 10, 
clustering indexes tend to be slightly larger, exhibiting a smaller variance. However this is not 
always the case: for instance with low/medium group stability and j3 = low-\-, we observe that the 
performances decrease from 5 to 10 time points (smaller ARI values). We believe that this is due 
to the initialization of our procedure: with T = 10 time points, it is more likely that the groups 
membership differ from their initial value. As we use as a starting point a constant with time value 
for these memberships, our algorithm is farther from the optimal value. 


Mean squared errors (MSE) for estimation of the transition parameter tt are given in Supple¬ 
mentary Figure [SJ] We only show MSE for tt as the MSE for (/3, 7 ) are strongly correlated with 
the clustering results. This figure shows that when groups are not globally recovered, the MSE 
values may be high (up to 15%). However in most of the cases, these MSE are rather small (less 
than 2 %) so that the dynamics of the groups membership is captured. 


Now, we c ompa re our results with other procedures. The models f rom lYang et al. ( 2011 ): 
Xu and Hero ( 2014h are the closest t o our setup . Sin ce Xu and Herol ( 2014h obtained compa- 


rable perform ances as the ones from lYang et al.l (|201l[ l. we focus on the latter he re. (In fact , 
Xu and Hero ’s method is faster, with slightly lower clustering perf ormances th an Yang et ^ ’s 


one.) Thus, we use the offline version of the algorithm proposed in Yang et al. ( 20 111) (Matlab 
code is available on the web site of the first author). We ran their code on the same setup as 
above. When relying on default values of the algorithm, the results obtained are very poor, with 
ARI values smaller than 10“^ in general (data not shown). We note that the authors do not discuss 
initialization and simply propose to start with a random partition of the nodes, which proves to 
be a bad strategy. In order to make fair comparisons, we thus decided to combine their algorithm 
with our initialization strategy. Results are presented in Supplementary Figure [S3l 


From thi s figure, we can see that putting appart our initialization strategy, our procedure 
outperforms Yang et ^ ’s one (they globally have much lower ARI). Indeed, their method obtains 
good performances only in a few cases: (rrhigh^P € {medium-h; med w/ ajfiliation}^T € {5,10}) ; 
{'TThigh, P € {low-\-, medium— = 5) and {rrmedium, P G {medium-h; med w/ affiliation}, T = 5). 
In all these cases, we can see that the method’s performances are due to a very good initialization. 
Now, when the true classification is farther from initialization, the performances considerably 
drop. In particular, for intermediate cases (e.g. medium group stability or high group stability 
with T = 10), we can see th at our meth od still succeeds in obtaining a good partition (Figure [3]) 
while this is not the case for lYang et ahl ’s one (Supplementary Figure [S3]). 
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Selected number of groups 


Selected number of groups 


Figure 4. Estimation of the number of groups via ICL criterion. Left panei shows the frequency of the 
selected number of groups. Right panel shows ARI of the classification obtained with 4 groups depending 
on the selected number of groups. 


4.2. Model selection 

We simulate a binary dynamic dataset with Q = 4 groups, transition matrix between states satisfies 
TTqq = 0.91 and TTqi = 0.03 for q ^ 1. Bernoulli parameters are chosen as follows: we draw i.i.d. 
random variables {eqi}i<q<i<A S [—1,1] and then choose 

Vf? E Q, Pqq = 0.4 + eqqO.l, 

^ I G dqi = 0.1 + Cq/O.!. 

We generate 100 datasets under this model and estimate the number of groups relying on ICL 
criterion. Results are presented in Figure |H We observe that the correct number of groups is 
recovered in 88% of the cases (left panel). Moreover, the right panel shows that when ICL selects 
only 3 groups, ARI of the classification with 4 groups is rather low (less than 80%). This shows 
that in those cases, classification with 4 groups is not the correct one, so that VEM algorithm seems 
responsible for bad results (optimum has not been reached) more than the penalization term. 

5. Revealing social structure in dynamic contact networks 

Dynamic network analysis has recently emerged as an efficient method for revealing social structure 
and organization in humans and animals. Indeed, many studies are now beyond the analysis of 
static networks and take advantage of longitudinal data on the long term, for instance during 
days or years of observations, that allow for constructing dynamic social networks. In particular, 
contact networks built from field observations of association between animals or from sensors-based 
measurements, are now currently available in Ecology or Sociology. In this section, we show that 
our statistical approach is a suitable tool to analyze dynamic contact networks from the literature. 


5. 1. Encounters among high school students 

Describing face-to-face contacts in a population (in our case, a classroom) can play an important 
role in 1/ understanding if there is a peculiar non-random mixing of individuals that would be a 
sign for a social organization and 2/ predicting how infectious diseases can spread, by studying the 
crosslink between the contacts dynamics and the disease dynamics. As a first stone, it is therefore 
mandatory to find an appropriate model to analyse these contacts and we propose to use our 
dynamic SBM to achieve this step. 

The dataset consists in face-to-face encounters of high school st udents (measured through the 
use of wearable sensors) of a class from a French high school (see Fournet and Barrat , 120141 for 
a complete description of the experiment). In this class called ’PC’ (as students focus on Physics 
and Chemistry), interactions were recorded during 4 days (Tuesday to Friday) in Dec. 2011. We 
kept only the 27 (out of 31) students that appear every day, i.e. that have at least one interaction 
with another student during each of the 4 days. Interaction times were aggregated by days to form 
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a sequence of 4 different networks. These are undirected and weighted networks, the weight of an 
interaction between two individuals being the number of interactions between these 2 individuals 
divided by the number of time points for which at least two individuals interacted; thus a non 
negative real number that we call interaction frequency. After examination of the distribution of 
these weights, we choose to discretize these data into M = 3 bins (see Example [5] in Section m 
from Supplementary Materials) corresponding to low, medium and high interaction frequency. As 
already mentioned, our model selection criterion is not fitted to this case (see Section m from 
Supplementary Materials for more details). We thus choose to rely instead on the ’elbow’ method, 
applied to the complete data log-likelihood. It consists in identifying a change of slope on the 
curve that represents this complete data log-likelihood for different values of Q. The method 
selects Q = A groups (see Supplementary Figure IS^ and we now present the results obtained with 
our model fitted with Q = A groups. 

We observe that groups 2 and 3 are composed by students that are likely to interact together 
(i.e. 1322 and $33 are close to 1, see Figure E]). Furthermore, the frequency of their interactions 
inside their groups is higher than in the rest of the network Iffqqilow) < ■fqq{medium) < ^nnihigh) 
for Q = 2,3, same figure). These two groups form two communities such as defined in iFortunato 
( 2 OIOII . Moreover, we observe that both groups include a certain number of individuals (3 and 4 
respectively) that permanently stay in the group over time (see Figure E]). These individuals may 
play the role of ’social attractors’ or ’core leaders’ around which the other students are likely to 
gravitate. Group 4 displays a similar pattern of community structure, with much less interaction 
(intermediate value of fdn) but also a significant level of interaction with group 2 (Figure[5|). Inter¬ 
estingly, groups 2 and 4 also exchange students over time (see fluxes between groups in Figure El) 
and this could reflect some cooperation or affinity between the students of these two groups. Group 
I is quite stable over time (7 permanent members, see Figure El) and is characterized by a low rate 
of interactions inside and outside the group (low values in Figure El). It clearly gathers isolated 
students, but this does not mean that they do not interact with any student, they usually do so, 
but with a small number of partners. Therefore, we do not only decipher evolving communities 


(such as in lYang et al.l . l20IIri but we also highlight the dynamics of aloneness inside this class. 



Figure 5. Summary of the interaction pa rameters 0 and -y estimate d by our modei with Q = 4 grous on the 
dataset of interactions in the ’PC’ ciass iFournet and Barralil20l4 . In each ceil {q,l) with 1 < g < f < 4, 
there are T = 4 barpiots corresponding to the 4 measurements (Tuesday to Friday). Each barpiot represents 
the distribution of the parameter 7 *; for the three categories of interaction frequency {low, medium and high). 
The width of each barpiot is proportionai to the sparsity parameter /3*;. We recail that when considering the 
diagonai ceils (g, g), parameters do not depend on t anymore. 

We now investigate if gender differences may help in (a posteriori) explaining or refining the 
interaction patterns that we reveal. We first note that group 3 is exclusively composed by male 
students (Supplementary Figure lSSi) . This observation along with the previous conclusions suggests 
that group 3 may be a closed/exclusive male-community. Meanwhile, some of these male students 
move to group 1 which is partly composed by a ’backbone’ of female students that stay in group 
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Figure 6. Alluvial plot showing th e dynamics of the group rn embership estimated by our model on the dataset 
of interactions in the 'PC’ class iFournet and Barrail20l4 . Each line is a flux that represents the move of 
one or more students from a group to another group {Di - k indicates group k for day i). The thickness of 
the lines is proportional to the number of students and the total height represents the 27 students. 


1 (Supplementary Figure [S5]) . Moreover, we clearly observe that female students are likely to stay 
in their group (most of the moves between groups are realized by males), and that a majority of 
them are in low-interacting groups 1 and 4. But not any female student moves between these two 
groups, which supports a clear dichotomy pattern in the female organization with respect to male 
organ ization. In summary, we show evidence for some gender homophily fsee lFournet and Barrat . 
20141 for a precise definition), i.e. gender is a key factor for explaining the dynamics of the 


interactions between these young adults. 

Lastly, we note that both information captured by our model (say (3 and 7 ) are often conver¬ 
gent/correlated in this case, but we note that studying this network with a binary model (i.e. not 
considering the interaction frequency) does not allow to capture interesting structure (data not 
shown). Therefore, the presence/absence of an interaction as well as its frequency are important 
and require an explicit modelling such as in our approach. 


5.2. Social interactions between animals 

Interactions among animals are dynamic processes. How and why the topology of the network 
changes (or not) over time is of primary interest to understand animal societies. Here we analyze 
two datasets of animal contact networks, the first one dealing with migratory birds (sparrows) and 
the second with indian equids (onagers). Both datasets are analysed with the extended model 
presented in Section |E] from Supplementary Materials. 

Sparrows were captured and marke d during winters of 2 010-12 in a small area (The University 


of California, Santa Cruz Arboretum, IShizuka et al 


2014h . During these three seasons, Shizuka 
and colleagues recorded birds interactions (into flocks, i.e. individuals in the same place at the 
same time) and they aggregated their observations by seasons. They observed 69 birds in total, but 
there was a significant turnover of birds due to mortality and recruitment and only some of these 
69 birds are present at each season (31, 46 and 27 birds, respectively). The dataset is therefore 
composed by T = 3 undirected and weighted networks, with specified presence/absence of nodes at 
each of the three time steps. Edges are weighted by the number of times pairs of birds have been 
seen together at the same place and time (if zero, no edge). The authors identified re-assembly of 
same communities (as defined previously) across seasons despite the birds turnover. This stability 
is due to social preferences a cross years be t ween individuals that re-join the community located 


in the same area of the site ( Shizuka et al.l . 2014h . Our model is a perfect candidate to fit these 


observations: indeed, constraints from Equation ([5]) are appropriate in this case where the com¬ 
munities keep existing over time (and therefore the parameters remain stable over time) but the 
membership is evolving (in particular, due to the presence/absence of birds in the three seasons). 
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As previously, we discretized the edge weights into M = 3 bins {low, medium and high interaction 
level) and we selected Q = A groups with the ’elbow’ method (data not shown). Examination of 
the estimated parameters $ and 7 (Figure 0 left panel) reveals that groups 2 , 3 and 4 are clear 
communities (with different intra-group behaviors) that eventually correspond to those revealed by 
Shizuka et al. Most of that, our method proposes to gather peripheral birds into group 1. Clearly, 
we observe some stability across years with individuals staying in communities 2, 3 and 4 over 
time (see horizontal fluxes in Figure 0 left panel) and that are joined by incoming birds (see fluxes 
from the fake group 0 of absent birds in this figure). All these observations confirm the analysis in 


Shizuka et al.l (1201411 and demonstrate that our modelling approach is particularly suited to such 


datasets. 


Onagers were observed in the Little Rann of Kutch, a desert in Gujarat, India (iRubenstein et al 


201, ’ih . Each time a herd (group) has been encountered, association between each pair of individuals 


in the group was recorded. We retained the data association of 23 individuals present at least once 
between February and May 2003 and we aggregated interactions by month. The dataset contains 
therefore T = 4 undirected and weighted networks, with specified presence/absence of nodes each 
month. Edges are weighted by the number of times pairs of onagers belong to the same herd (if 
zero, no edge). Again, we discretized the edge weights into M = 3 bins {low, medium and high 
interaction level) and we selected Q = 3 groups. Visual inspection of the estimated parameters 
(Figure [3 right panel) shows that cluster 1 gathers peripheral onagers that can actually stay away 
from the others because preda tors have been ext i rpate d from this habitat (and so, no collective 


protection strategy is required, R.ubenstein et al. ( 201,5h l. Cluster 2 is composed by followers on¬ 


agers which have some interactions between them and much more with those of gr oup 3 whereas 


onagers in group 3 form a rich club community (i.e. clique of hubs as defined in IColizza et al 


2006h . with high values of estimates Pas, ■ysa{high). This community is evolving over time by in¬ 
tegrating one or two onagers during the successive months. Interestingly, the social integration 
process is revealed and somehow hierarchical: previously absent onagers (fake group 0 in Figure 0 
right panel) are likely to integrate group 1 , onagers of group 1 can possibly move to the followers 
groups (i.e. group 2 ), and a few followers can be integrated over time in the central rich club 
community (group 3). Again, t he structure of the onage rs social network remains persistent over 
time (see similar conclusions in R.ubenstein et ah . I201 !tI 1 and our model is therefore particularly 
adapted and efficient in this case. 


q=1 -1 q= 

2 q=3 

q=4 

1=1 

T(high) ■■■ 
7 (meclium) H H ■ 


l=2 

SI S2 S3 

m 

l=3 


T ■ 

m 


q=i 



Figure 7. Summary of the inter action parameters ,3 a nd 7 estimated by our modei with Q = 4 groups on the 
dataset of sparrows (lef t panei. lShizuka et ai.il20l4 and Q = 3 groups on dataset of onagers (right panei, 
Rubenstein et ai.Il201^ . respectiveiy. Same principie as in Figure0 
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Figure 8. Alluvial plot showing the dynamics of the grou p membership estima ted by our model 
on th e datasets of in t eractio ns between 69 sparrows (left, IShizuka et alJ j20l4 ) and 23 onagers 
(right, iRubenstein et~^ 1201 Sli ) respectively. Same principle as in Figure [6] (with Si - k, Mi - k indicat¬ 
ing group k at Season or Month i). A fake group (group 0) gathers absent animals at a specific time step and 
fuzzy fluxes represent arrival/departure to/from a group from/to group 0. 
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A. Counter example of identifiability when groups memberships and connectivity param¬ 
eters vary freely 

Here, we exhibit an example where the parameters are non identifiable when both groups mem¬ 
berships and connectivity parameters may vary across time without any constraint. 

Let us consider the case of Q = 2 groups and (for simplicity of notation) 2T time points. 
We fix a first parameter value 9 = ( 7 r,/ 3 , 7 ) defined hy tv = Id the size-two identity matrix and 
(/ 3 ‘, 7 *) = (/ 3 , 7 ) are chosen constant with t. In the following, we let (t>qi{-) denote the constant 
(with time) conditional density distribution of any Yl given = 1 , under parameter value 

9. The latent process has stationary distribution a = (1/2,1/2) and since the latent configuration 
is drawn at the first time point and stays constant (tt is the identity), it can be seen that the 
distribution on the set of observations Y is given by 

. 2T 

•’»(y) = 5]v H n 

qi---qNGQ^ ^<i<j<N 
. 2T 

9l...9iveQ" l<i<j<N t=l 


Now we consider a second parameter value 9 = (tt, 7 ) such that 


TV = 


which corresponds to the same latent stationary distribution a = ( 1 / 2 , 1 / 2 ) but now the latent 
configuration is drawn at the first time point and then each node switches group at each following 
time point. For any g G {1, 2}, we let q denote the unique value such that {q, g} = {1, 2}. Moreover, 
for any q G { 1 , 2 }, we set the within group parameter at time t = 1 to (/3qq,7qq) = iPqq^lqg), or 
equivalently, we set the conditional distribution of given = 1 , under parameter value 

9, equal to previous value (j)qq. Then, we switch the within group parameters values at each time 
point by setting 

Vt > 1, ^ 2 ^}" = 022 02^ = ^ 11 - 

Finally, the across group parameter is not modified through time and we set 0b = 0 i 2 - Now, we 
can write the distribution of Y under parameter value 9 

qi-.-qNSQ^ l<i<j<N 

^ n 4>qiqjiXij)4>qiqj0^ij) ■ ■ ■4>qiqj0^ij )) 

qi---qNeQ^ i<i<j<N 


so that FJ = ■ To conclude, it suffices to show that there is no global permutation cr G &q 

such that 9 = a{9). This can be seen from the fact that for any a G &q, we have ct(tv) = tt y tt. 
Thus the two parameters 9, 9 are not equal up to label switching while they produce the same 
distribution on the observations. It follows that the parameter 9 is not identifiable up to label 
switching. Note that the SBM part of the parameter is recovered up to local label switching as 
choosing the permutations a 2 t = Id and cr 2 t-i = ( 1 , 2 ) (the transposition in 62 ) for any 1 < t < T, 
we obtain that = ( 0 *, 7 ‘). 


I 



B. Non identifiability in affiliation case (planted partition) 


Identifying the whole parameters from a binary affiliation SBM is a di fficult task, as may b e seen for 
instance by the many different but always partial results obtained bv I Allman et al.l ( 201lh . In their 
Corollary 7, the authors establish that when group proportions are known, the parameters /3in (•— Pqq 
for all q) and /3out(:= Pqi for all q ^ Z) of a binary affiliation static SBM are identifiable. In the 
weigh ted affiliation case, a ll parameters (q:,/ 3 *, 7 *) of a (static) SBM may be identified (Theorem 
13 in lAllman et al.l . Imil)- Following the proof of Proposition [U we could identify (a, ^, 7 ) in 
dynamic affiliation SBM under natural assumptions. Now, without an additional constraint on 
the transition matrix tt, it is hopeless to identify the transition parameters. Indeed, as the groups 
play similar roles at each time step, label switching between different time steps is free to occur 
and TT may not be identified (note that assuming that or does not depend on t is of no 
help here). This may be seen for instance from the example constructed in Section El that remains 
valid in the affiliation case. In fact, identifying tt in dynamic affiliation SBM seems to be as hard 
as identifying the group proportions in static binary affiliation SBM. While static affiliation often 
relies on an assumption of equal group proportions, there is no simple parallel situation for the 
transition matrix tt in the dynamic case (the trivial assumption tt = Id is far too constrained). 
Let us now give some intuition on why tt is difficult to recover. For instance, following the proof 
of Proposition [T] and looking at the distribution of (Fj*, enables us to identify a mixing 

distribution with four components as follows. Let (resp. (5* 
mass at parameter (/3i„,7(„) (resp. (/^out)7out))- From the distribution of (Fj‘ ,FF 
the four following components 


o„t) be a shorthand for the Dirac 
‘ we identify 


qq' q l^m q^l m 

(EE OiqO^l'^qq'T^U’^^out ® '^out ' 

q^l q'=jtV 


Now relying on the knowledge of the proportions of each of these four components, it can be 
seen that it is not easy to identify the individual values of tt. Without a proper identification 
of the transition matrix tt, we do not recover the behavior of the group membership through 
time. Empirical evidence for label switching between time steps in the affiliation setup is given in 
Section 0] from the Main Manuscript. 


C. Optimization with respect to T(i, q) 

In this section, we provide the exact fixed point equation satisfied by the values T(i, q) maximizing 
J{9, t). We have 


Q 

T{i,q)(xag n 

t>2q2---qt 


( 


T^qt-iqt 

T{t,i,qt-I,qt) 


f( 2 ,i,q,q 2 )...f{t,i,qt-i,qt) 


X (j)t ^(Yt ymB.rg{t,j,l)f{2,i,q,q2)...f{t,i,qt-i,qt)^ 

t>2 q2...qt,l 


with the convention: whenever t = 2 then qt-i = q. This equation is to be compared with our 
approximation given by ©. 


D. Estimation of 7 and model selection: specific examples 

The M-step equations concerning 7 differ depending on the specific choice of the parametric family 
{/(•,7),7 G F}. We provide here many examples of classical choices for these parametric families. 
Remember that the resulting conditional distribution on the observations is a mixture between an 
element from this family and the Dirac mass at zero. We also provide expressions for ICL criterion 
in these different setups. 
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Example 1 (Bernoulli). This specific case corresponds to a degenerate family with only one 
element, the Dirac mass at 1, namely F{y,^) = dify). The parameter 9 reduces to (tt,/?) for 
which updating expressions at the M-step have already been given (see Proposition Hi. Note that 
we imposed the constraint 13^,^ constant with respect to t, for any q € Q. Now, model selection is 
performed through ([5]) where 


pen{N, T, (3, 7 ) = pen{N, T, (3) = log ^ t log . 

Example 2 (Finite space). Let us consider a finite set of M > 2 known values {ai,... jOm} 
not containing 0 and 

M 

f{y,l) = 7Mly=a„, 

m—1 

with 7 (m) > 0 and value 7 that maximizes J{0 ,t) with respect to 7 is given 

by 

w, w / , ^ ^2 w -t / N T.l<i^j<NTrnarg{t,l,q)Tmarg{t,j,l)lYf=a,n 
\/t,\/q I G Q ,\/m, = -- . -^-, 

'^marg(t, I, q)Tmarg(t, J, 1) ^Yf=am 


'iqGQ,'im, 7 ^^(m) = 


St=l ThlKiytjKN Tmargit, i, q)Tmarg{t,j, q)3Yf=a 

'''margit, i, q)Tmarg(t, j, g)lyt 


These equations remain valid when considering a set of disjoint bins {Im}m instead of pointwise 
values {am}m- 

In this setup, we do not propose a model selection criterion for selecting the number of groups 
Q. Indeed, our investigations show that a competition occurs between the number of bins M and 
the number of groups Q, so that in general we end up selecting only Q = 2 groups because of a large 
number of parameters (data not shown). In fact, this finite distribution setup may be viewed as a 
nonparametric model for which BIC-like criterion (ICL is of that type) are not suited. Section\^ 
from the Main Manuscript proposes another approach to handle this case, relying on the ’elbow’ 
method applied on the complete data log-likelihood. 


Example 3 (Poisson). We consider the truncated Poisson distribution 

/(j/,7) = (e^ - 1)”^^, ?/eN\{0}, 

resulting in either a 0-inflated or 0-deflated Poisson when mixed with the Dirac mass at 0. Let 


Vcc > 0, ipix) = 


- 1 


which is a strictly increasing function and as such admits a unique inverse function on 

(l,+oo). Note that has no simple analytic expression but for any fixed y € (l,+oo), the 

value X = Tnay be found numerically by solving for x the equation a;e^/(e“ — 1) — y = 0. 

Now the value 7 that maximizes J{9,t) with respect to 7 is given by 


yt,yq^lG 




'^margit, i, q)TYnarg(t, j, IfPi 


J '^maro(t, i, q)Xmara(t, j, PlyL^n ^ 


Vg e Q, %q = ( 


i)/Er=iE 


l<i^j<N ‘rnarg 


Tmargif, i, q)Tjnarg{t, j, g)E) 


Model selection is obtained by maximizing 

1 


pen{N,T,f3,‘j) =-(\{/3gq,qG Q}| + q'J log 


'^margit, i, q)Tmarg{t, j, 9) IpA ^0 

) with 

N{N-1)T\ 


- 


ll,l<q<l<Q,l<t<T}\ + 


Q(Q-I) 


T 


log( 


N{N-1) 
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Moreover, if (3*^1 = Pout does not depend on t and pqq = Pin, the penalty term in ICL beeomes 

. 1(2.«) . l(^r) i„, ( 


Example 4 (Gaussian homoscedastic). Let us consider the Gaussian distribution 


fiy,i) = 


\p2/K( 


1 ( {y-L?\ 


where 7 = (/i, ct^) G M x (0, +00). For parsimony reasons, we ehoose to consider the homoscedastic 
case where the variance is constant across groups and simply denoted by af. The value 7 that 
maximizes J(9,t) with respect to 7 is given by 


ypyq^iGQ^, 

y\i — 

yq&Q, 

Tqq — 

and 'it, 



E 


l<i^j<N 'rnarg 


{t, i, Q)Tmarg{t,j, OE? 


Eij Tmargit, i, Q)Tniarg(t, j, 


Et=l J 2 l<i^j<N '''marg 


(t, i, g)Tmarg{t, j, <7)^7 


'^marg{t, i, (])Tmarg{t, j, Cply* 


El<j<j<A El< 5 ,i<Q Tmarg{t, *, q)Tmarg{t, j, OK* 

Y2i,j,q,l 'Tmargit, i, q)Tmarg{t,j, l)3yt^^Q 
Here the remaining penalty term in for ICL criterion writes 

peniN.r,,.,) ^ l(2«.„g (^) 


E. Extension to varying number of nodes 

In the present work, we limited ourselves to the case where the list of nodes Ai} stays 

constant across time. However in real data applications it may happen that some actors enter 
or leave the study during the analysis. This may be handled in a simple way as follows. Let us 
consider V = {l,...,fV} as the total list of individuals and for each time step t, a subset V* of 

V with cardinality Nt of actors are present. Data is formed by a series of adjacency matrices 

Y = (Y‘)i<t<T where each Y* still has size N x N. For all pair of present nodes i,j G V*, entry 
Yl characterizes the binary or weighted interaction between i,j while for any i,j gV such that 
i pL V*, entry Yl is set to 0. Now, we construct the latent process Z = {Zl)i<t<T,i£V on an 
extended set Qa = Q U {a} where the extra value a stands for absent. For each time step t and 
whenever i G V*, random variable Y* is constrained to vary in Q while for any i p: V* we fix 
Z* = a. As previously, the random time series {Zpi^v are supposed to be independent while for 
each individual i G V , the sequence Zi = (Y*)i<t< 7 ’ forms an inhomogeneous Markov chain with 
values in Qa and transitions tt* constrained by, for all q, q' G Q, 

nl = nZl = a\Zl-^ = q) = l{tiV^}, 

TT*, = P(Y‘ = = a) = aql{i G V% 

= F{Zl = = q)= nqq,l{^ G E*}. 

Here, tt = {.'^qq')i<q,q'<Q stands as previously for a transition matrix on Q of an irreducible 
aperiodic stationary Markov chain with stationary distribution a. Note that the whole chain Zi 
is not stationary anymore. The probability of any trajectory of the latent process simply writes as 

N T 

F{Z) = Y[F{Zl)l[F{Zl\Zl-^)=l[a^^x ^ 

i=l t=2 qeQ q,q'eQ 
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where 


T 

n,= Y, i{zl = q} + Y. Y. = g}, 

T 

and N,,,=Y E HZY=q,Zl = q'}. 
t=2 ieyt-iny* 

As such, a node that would not be present at each time point contributes to the likelihood only 
through the part of the trajectory where it is present. Moreover, given the latent groups Z, for 
any i,j G V*, the conditional distribution of is still given by © while whenever i gV ^ 

we have 11* is deterministic and set to 0. Thus, a node absent at time t does not contribute to the 
likelihood of the observations. Generalization of our VEM algorithm easily follows. 

F. Supplementary figures 
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Figure S1. Boxplots of elapsed time (in seconds) for the estimation algorithm (including initialization) on 50 
simulated datasets in the Bernoulli case with /3 = medium+ and high group stability (see Section|4]in Main 
Manuscript), for N = 500,1000 and Q = 5,10. Performed on Intel Xeon E5-1620 v2 (8 threads, 3.70GHz). 
The stopping criteria is set to a relative difference on J values of 1 x lO”"*. 
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Figure S2. MSE boxplots for estimation of transition matrix n in different setups. From left to right: the three 
paneis correspond to tt = -kio-w (paneis A,D), nmedium (paneis B,E) and nhigu (paneis C,F), respectiveiy. In 
each panel, from left to right: results corresponding to /3 = low—, low+, medium—, medium+ and affiliation 
case, respectively. First row: T = 5 time points, second row: T = 10. 
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medium group-stability 


high group-stability 
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Figure S3. Boxplots of global ARI (white, le ft) and ave raged ARI (grey, right) in different setups for the 
combination of our initialization strategy with lYana et al.t s algorithm. From left to right: the three panels 
correspond to tt = mow (panels A,D) 1 '^medium (panels B,E) and -jThigh (panels C,F), respectively. In each 
panel, from left to right: results corresponding to /3 = low—, low+, medium—, medium+ and affiliation case, 
respectively. First row: T = 5 time points, second row: T = 10. 
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Figure S4. Complete data log-likelihood estimate d for different numbers of groups on the dataset of interac¬ 
tions in the ’PC’ class jpournetand Barralll20l4 . 



Figure S5. Same as in Figure [6] frem Main Manuscript for the 12 female students (left panel) and the 15 
male students (right panel). 
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